Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
Abstract
This document gives an overview of the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, and the challenges and problems faced for each language. This paper covers the work on Arabic, Chinese, English, Greek and Romanian. A second part, covering the remaining languages, is available as a separate paper in the MultiLing 2013 proceedings.
Similar resources
MultiLing 2013: Multilingual Multi-document Summarization
This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus on 10 different languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind th...
Multi-document multilingual summarization corpus preparation, Part 2: Czech, Hebrew and Spanish
This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus on 10 different languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind th...
Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing Workshop
The MultiLing 2013 Workshop of ACL 2013 posed a multilingual, multi-document summarization task to the summarization community, aiming to quantify and measure the performance of multilingual, multi-document summarization systems across languages. The task was to create a 240–250 word summary from 10 news articles, describing a given topic. The texts of each topic were provided in 10 languages ...
Multilingual Single-Document Summarization with MUSE
MUltilingual Sentence Extractor (MUSE) is aimed at multilingual single-document summarization. MUSE implements a supervised language-independent summarization approach based on optimization of multiple sentence ranking methods using a Genetic Algorithm. The main advantage of MUSE is its language-independency – it is using statistical sentence features, which can be calculated for sentences in a...
CLASSY Arabic and English Multi-Document Summarization
Our Multilingual Summarization Evaluation entries for MSE-2006 were based upon an improved version of our CLASSY (Clustering, Linguistics, And Statistics for Summarization Yield) system. Our two entries were systems 20 and 21 and represented approaches based upon extracts from a) only English documents and b) English and the translated Arabic documents (full clusters). This paper presents a bri...